Ancestral Inference and Learning for Branching Processes in Random Environments

Jiang, Xiaoran, Vidyashankar, Anand N.

arXiv.org Machine Learning

Ancestral inference for branching processes in random environments involves determining the ancestor distribution parameters using the population sizes of descendant generations. In this paper, we introduce a new methodology for ancestral inference utilizing the generalized method of moments. We demonstrate that the estimator's behavior is critically influenced by the coefficient of variation of the environment sequence. Furthermore, despite the process's evolution being heavily dependent on the offspring means of various generations, we show that the joint limiting distribution of the ancestor and offspring mean estimators, under appropriate centering and scaling, decouples into independent Gaussian random variables when the ratio of the number of generations to the logarithm of the number of replicates converges to zero. Additionally, we provide estimators for the limiting variance and illustrate our findings through numerical experiments, data from Polymerase Chain Reaction experiments, and COVID-19 data.
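
For intuition, here is a minimal sketch of generalized-method-of-moments ancestral inference, not the authors' estimator: it assumes Poisson-distributed ancestors with mean theta and constant-mean Poisson offspring (a fixed environment, far simpler than the random environments treated in the paper), and matches per-generation sample means of replicate population sizes to the model moments E[Z_g] = theta * m^g. All names and parameter choices below are ours.

```python
# Hypothetical GMM sketch: Poisson(theta) ancestors, Poisson offspring with
# constant mean m, so E[Z_g] = theta * m**g for generation g.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def simulate(theta=5.0, m=1.3, n_gen=4, n_rep=2000):
    """Simulate replicate population sizes: Poisson ancestors, Poisson offspring."""
    z = rng.poisson(theta, size=n_rep)            # Z_0: ancestor counts per replicate
    paths = [z]
    for _ in range(n_gen):
        z = rng.poisson(m * z)                    # offspring sum ~ Poisson(m * Z)
        paths.append(z)
    return np.stack(paths)                        # shape (n_gen + 1, n_rep)

def gmm_objective(params, zbar):
    theta, m = params
    g = np.arange(len(zbar))
    resid = zbar - theta * m ** g                 # moment conditions E[Z_g] = theta * m^g
    return resid @ resid                          # identity GMM weighting matrix

zbar = simulate().mean(axis=1)                    # per-generation sample means
fit = minimize(gmm_objective, x0=[1.0, 1.0], args=(zbar,),
               bounds=[(1e-6, None), (1e-6, None)])
print("GMM estimates (theta, m):", fit.x)
```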


Asymptotic Properties for Bayesian Neural Network in Besov Space

Neural Information Processing Systems

Neural networks have shown great predictive power when applied to unstructured data such as images and natural languages. In this paper, we show that the Bayesian neural network with a spike-and-slab prior has posterior consistency with a near minimax optimal convergence rate when the true regression function belongs to the Besov space. The spike-and-slab prior is adaptive to the smoothness of the regression function, and the posterior convergence rate does not change even when the smoothness of the regression function is unknown. We also consider the shrinkage prior, which is computationally more feasible than the spike-and-slab prior, and show that it has the same posterior convergence rate as the spike-and-slab prior.
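
For reference, here is a common spike-and-slab construction and the shape of a posterior-contraction statement, in our own notation (the paper's exact prior, norm, and hyperparameters may differ):

```latex
% A common spike-and-slab form: each weight is exactly zero with probability
% 1 - \lambda and otherwise drawn from a Gaussian slab.
\[
  w_j \mid \gamma_j \;\sim\; (1 - \gamma_j)\,\delta_0 + \gamma_j\,\mathcal{N}(0, \sigma^2),
  \qquad \gamma_j \sim \mathrm{Bernoulli}(\lambda).
\]
% Posterior consistency with a near minimax rate then reads, for a true
% regression function f_0 in a Besov ball and some constant M > 0,
\[
  \mathbb{E}_{f_0}\,\Pi\bigl(\|f - f_0\| > M\,\varepsilon_n \bigm| \mathcal{D}_n\bigr) \longrightarrow 0,
  \qquad \varepsilon_n \asymp n^{-s/(2s+d)} (\log n)^{\kappa},
\]
% where s is the Besov smoothness, d the input dimension, and kappa the
% exponent of a logarithmic factor.
```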


Online Identification of Stochastic Continuous-Time Wiener Models Using Sampled Data

Abdalmoaty, Mohamed, Balta, Efe C., Lygeros, John, Smith, Roy S.

arXiv.org Artificial Intelligence

It is well known that ignoring the presence of stochastic disturbances in the identification of stochastic Wiener models leads to asymptotically biased estimators. On the other hand, optimal statistical identification via likelihood-based methods is sensitive to the assumptions on the data distribution and is usually based on relatively complex sequential Monte Carlo algorithms. We develop a simple recursive online estimation algorithm, based on an output-error predictor, for the identification of continuous-time stochastic parametric Wiener models through stochastic approximation. The method is applicable to generic model parameterizations and, as demonstrated in the numerical simulation examples, is robust with respect to the assumptions on the spectrum of the disturbance process.
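
As a caricature of the recursive output-error idea, here is a sketch for a discrete-time Wiener model (a first-order linear filter followed by a known tanh nonlinearity, with noise only at the output); the paper treats continuous-time models from sampled data, and all names and step-size choices here are ours:

```python
# Discrete-time stochastic-approximation sketch, NOT the authors' algorithm.
import numpy as np

rng = np.random.default_rng(1)
g = np.tanh                                      # known static nonlinearity
a_true, b_true = 0.8, 0.5                        # true linear-block parameters

def identify(n=20000):
    theta = np.array([0.2, 0.2])                 # estimates of (a, b)
    x = 0.0                                      # true hidden state
    x_hat, dxa, dxb = 0.0, 0.0, 0.0              # predictor state and sensitivities
    for k in range(1, n + 1):
        u = rng.normal()                         # sampled input
        x = a_true * x + b_true * u
        y = g(x) + 0.1 * rng.normal()            # noisy sampled output
        dxa = theta[0] * dxa + x_hat             # d x_hat / d a (uses previous x_hat)
        dxb = theta[0] * dxb + u                 # d x_hat / d b
        x_hat = theta[0] * x_hat + theta[1] * u  # output-error predictor state
        err = g(x_hat) - y                       # prediction residual
        grad = err * (1.0 - g(x_hat) ** 2) * np.array([dxa, dxb])
        theta = theta - (0.5 / k) * grad         # Robbins-Monro decaying step
    return theta

print("estimated (a, b):", identify())
```

The sensitivity recursions for dxa and dxb are what keep the update purely online: no batch of past data is revisited at any step.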


Adaptive Lasso, Transfer Lasso, and Beyond: An Asymptotic Perspective

Takada, Masaaki, Fujisawa, Hironori

arXiv.org Machine Learning

This paper presents a comprehensive exploration of the theoretical properties inherent in the Adaptive Lasso and the Transfer Lasso. The Adaptive Lasso, a well-established method, weights the regularization of each coefficient by the inverse of its initial estimator and is characterized by asymptotic normality and variable selection consistency. In contrast, the recently proposed Transfer Lasso regularizes the deviation of each coefficient from its initial estimator, with the demonstrated capacity to curtail non-asymptotic estimation errors. A pivotal question thus emerges: given the distinct ways the Adaptive Lasso and the Transfer Lasso employ initial estimators, what benefits or drawbacks does this disparity confer upon each method? This paper conducts a theoretical examination of the asymptotic properties of the Transfer Lasso, thereby elucidating its differentiation from the Adaptive Lasso. Informed by the findings of this analysis, we introduce a novel method that amalgamates the strengths and compensates for the weaknesses of both methods. The paper concludes with validations of our theory and comparisons of the methods via simulation experiments.
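
In symbols, the contrast can be sketched as follows (notation ours: $\tilde{\beta}$ denotes the initial estimator and $\lambda$, $\gamma$, $\lambda_1$, $\lambda_2$ are tuning parameters; see the paper for the precise formulation):

```latex
% Adaptive Lasso: penalty weights divide by the initial estimates, so large
% initial coefficients are penalized less.
\[
  \hat{\beta}^{\mathrm{AL}} = \arg\min_{\beta}\;
  \|y - X\beta\|_2^2 + \lambda \sum_{j} \frac{|\beta_j|}{|\tilde{\beta}_j|^{\gamma}}.
\]
% Transfer Lasso: the penalty acts on the *difference* from the initial
% estimates, shrinking toward \tilde{\beta} rather than toward zero.
\[
  \hat{\beta}^{\mathrm{TL}} = \arg\min_{\beta}\;
  \|y - X\beta\|_2^2 + \lambda_1 \sum_{j} |\beta_j|
  + \lambda_2 \sum_{j} |\beta_j - \tilde{\beta}_j|.
\]
```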


An active learning method for solving competitive multi-agent decision-making and control problems

Fabiani, Filippo, Bemporad, Alberto

arXiv.org Artificial Intelligence

We propose a scheme based on active learning to reconstruct private strategies executed by a population of interacting agents and to predict an exact outcome of the underlying multi-agent interaction process, here identified as a stationary action profile. We envision a scenario where an external observer, endowed with a learning procedure, can make queries and observe the agents' reactions through private action-reaction mappings, whose collective fixed point corresponds to a stationary profile. By iteratively collecting sensible data and updating parametric estimates of the action-reaction mappings, we establish sufficient conditions to assess the asymptotic properties of the proposed active learning methodology so that, if convergence happens, it can only be towards a stationary action profile. This fact yields two main consequences: i) learning locally exact surrogates of the action-reaction mappings allows the external observer to succeed in its prediction task, and ii) since we work with assumptions so general that a stationary profile is not even guaranteed to exist, the established sufficient conditions also act as certificates for the existence of such a desirable profile. Extensive numerical simulations involving typical competitive multi-agent control and decision-making problems illustrate the practical effectiveness of the proposed learning-based approach.
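
A toy rendering of the query loop, in our own notation rather than the paper's algorithm: the unknown action-reaction mappings are taken to be affine so that the surrogate's fixed point has a closed form, and the observer alternates between exploratory queries and querying that fixed point.

```python
# Illustrative active-learning loop with affine reaction maps (assumptions ours).
import numpy as np

rng = np.random.default_rng(2)
n_agents, dim = 3, 2
d = n_agents * dim                               # dimension of a joint action profile

# Private affine reaction maps, unknown to the observer: x_i <- A_i x + c_i
A = [0.3 * rng.standard_normal((dim, d)) for _ in range(n_agents)]
c = [rng.standard_normal(dim) for _ in range(n_agents)]
react = lambda x: np.concatenate([A[i] @ x + c[i] for i in range(n_agents)])

queries, responses = [], []
x = rng.standard_normal(d)                       # initial exploratory query
for t in range(60):
    y = react(x)                                 # observe agents' reactions
    queries.append(x); responses.append(y)
    if len(queries) > d + 1:                     # enough data to fit a surrogate
        X1 = np.hstack([np.array(queries), np.ones((len(queries), 1))])
        coef, *_ = np.linalg.lstsq(X1, np.array(responses), rcond=None)
        W, b = coef[:-1].T, coef[-1]             # affine surrogate y ~ W x + b
        x = np.linalg.solve(np.eye(d) - W, b)    # query the surrogate's fixed point
    else:
        x = rng.standard_normal(d)               # keep exploring
print("stationary action profile estimate:", x)
```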


A Roadmap to Asymptotic Properties with Applications to COVID-19 Data

Cui, Elvis Han

arXiv.org Artificial Intelligence

A good estimator should, at least in the asymptotic sense, be close to the true quantity that it wishes to estimate, and we should be able to give an uncertainty measure based on a finite sample size. An estimator with well-behaved asymptotic properties can help clinicians in many ways, such as reducing the number of patients needed in a trial, cutting down the budget for toxicology studies, and providing insightful findings for late-phase trials. Following Sir Ronald Fisher [1], generations of statisticians have worked on the so-called "consistency" and "asymptotic normality" of estimators. The former is based on different versions of the law of large numbers (LLN) and the latter on various types of central limit theorems (CLT) [2]. In addition to these two main tools, statisticians also apply other important but less well-known results from probability theory and other mathematical fields: to name a few, extreme value theory for distributions of maxima and minima [3], convex analysis for checking the optimality of a statistical design [4], asymptotic relative efficiency (ARE) of an estimator [5], concentration inequalities for finite-sample properties and selection consistency [6], and other non-normal limits, robustness, and simultaneous confidence bands of common statistical estimators [7, 8]. Despite these many properties, consistency and asymptotic normality remain the most celebrated and important properties of statistical estimators in both academia and industry. Hence, in the following, we present a roadmap to consistency and asymptotic normality, and then provide their applications in toxicology studies and clinical trials using a COVID-19 dataset.
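
In standard notation, the two target properties read:

```latex
\[
  \text{consistency:}\quad \hat{\theta}_n \xrightarrow{\;p\;} \theta_0
  \quad (n \to \infty),
  \qquad
  \text{asymptotic normality:}\quad
  \sqrt{n}\,(\hat{\theta}_n - \theta_0) \xrightarrow{\;d\;} \mathcal{N}(0, \sigma^2).
\]
```

The first follows from LLN-type arguments and the second from CLT-type arguments; together they justify the Wald interval \(\hat{\theta}_n \pm z_{1-\alpha/2}\,\hat{\sigma}/\sqrt{n}\) as the finite-sample uncertainty measure the abstract alludes to.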


Mixed neural network Gaussian processes

Lindo, Alexey, Papamarkou, Theodore, Sagitov, Serik, Stewart, Laura

arXiv.org Machine Learning

This paper makes two contributions. Firstly, it introduces mixed compositional kernels and mixed neural network Gaussian processes (NNGPs). Mixed compositional kernels are generated by composition of probability generating functions (PGFs). A mixed NNGP is a Gaussian process (GP) with a mixed compositional kernel, arising in the infinite-width limit of multilayer perceptrons (MLPs) that have a different activation function for each layer. Secondly, $\theta$ activation functions for neural networks and $\theta$ compositional kernels are introduced by building upon the theory of branching processes, and more specifically upon $\theta$ PGFs. While $\theta$ compositional kernels are recursive, they are expressed in closed form. It is shown that $\theta$ compositional kernels have non-degenerate asymptotic properties under certain conditions. Thus, GPs with $\theta$ compositional kernels do not require non-explicit recursive kernel evaluations and have controllable infinite-depth asymptotic properties. An open research question is whether GPs with $\theta$ compositional kernels are limits of infinitely-wide MLPs with $\theta$ activation functions.
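
A scalar illustration of the compositional recipe, with PGF choices that are ours (the actual NNGP kernels compose normalized covariances layer by layer rather than raw scalars):

```python
# Compose a different PGF per layer, mirroring a different activation per layer.
import numpy as np

def pgf_poisson(s, lam=1.0):
    return np.exp(lam * (s - 1.0))               # PGF of Poisson(lam)

def pgf_geometric(s, p=0.5):
    return p / (1.0 - (1.0 - p) * s)             # PGF of Geometric(p) on {0, 1, ...}

def mixed_kernel(s, layers=(pgf_poisson, pgf_geometric, pgf_poisson)):
    """Layer-by-layer composition of (possibly different) PGFs."""
    for G in layers:
        s = G(s)
    return s

# PGFs map [0, 1] into [0, 1] and fix s = 1, so composed values stay bounded.
print([round(float(mixed_kernel(s)), 4) for s in (0.0, 0.5, 1.0)])
```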


Phase Transitions in Transfer Learning for High-Dimensional Perceptrons

Dhifallah, Oussama, Lu, Yue M.

arXiv.org Machine Learning

Transfer learning seeks to improve the generalization performance of a target task by exploiting the knowledge learned from a related source task. Central questions include deciding what information one should transfer and when transfer can be beneficial. The latter question is related to the so-called negative transfer phenomenon, where the transferred source information actually reduces the generalization performance of the target task. This happens when the two tasks are sufficiently dissimilar. In this paper, we present a theoretical analysis of transfer learning by studying a pair of related perceptron learning tasks. Despite the simplicity of our model, it reproduces several key phenomena observed in practice. Specifically, our asymptotic analysis reveals a phase transition from negative transfer to positive transfer as the similarity of the two tasks moves past a well-defined threshold. Transfer learning [1]-[5] is a promising approach to improving the performance of machine learning tasks. It does so by exploiting the knowledge gained from a previously learned model, referred to as the source task, to improve the generalization performance of a related learning problem, referred to as the target task.


Efficient Estimation and Evaluation of Prediction Rules in Semi-Supervised Settings under Stratified Sampling

Gronsbell, Jessica, Liu, Molei, Tian, Lu, Cai, Tianxi

arXiv.org Machine Learning

In many contemporary applications, large amounts of unlabeled data are readily available while labeled examples are limited. There has been substantial interest in semi-supervised learning (SSL), which aims to leverage unlabeled data to improve estimation or prediction. However, current SSL literature focuses primarily on settings where labeled data is selected randomly from the population of interest. Non-random sampling, while posing additional analytical challenges, is highly applicable to many real world problems. Moreover, no SSL methods currently exist for estimating the prediction performance of a fitted model under non-random sampling. In this paper, we propose a two-step SSL procedure for evaluating a prediction rule derived from a working binary regression model based on the Brier score and overall misclassification rate under stratified sampling. In step I, we impute the missing labels via weighted regression with nonlinear basis functions to account for non-random sampling and to improve efficiency. In step II, we augment the initial imputations to ensure the consistency of the resulting estimators regardless of the specification of the prediction model or the imputation model. The final estimator is then obtained with the augmented imputations. We provide asymptotic theory and numerical studies illustrating that our proposals outperform their supervised counterparts in terms of efficiency gain. Our methods are motivated by electronic health records (EHR) research and validated with a real data analysis of an EHR-based study of diabetic neuropathy.
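
A schematic of the two-step idea under strong simplifications (uniform weights standing in for genuine stratified-sampling weights, a simulated cohort, and an estimator form we write down from the standard augmented-imputation identity rather than from the paper):

```python
# Two-step SSL Brier-score sketch: impute, then augment with weighted residuals.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
n_lab, n_unlab = 300, 5000
X = rng.normal(size=(n_lab + n_unlab, 2))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X[:, 0] + X[:, 1]))))
labeled = np.zeros(len(X), dtype=bool); labeled[:n_lab] = True
w = np.ones(n_lab)                               # sampling weights (assumed known)

# Working prediction model, fit on the labeled data only
p_hat = LogisticRegression().fit(X[labeled], y[labeled]).predict_proba(X)[:, 1]

# Step I: weighted imputation model on nonlinear basis functions
basis = PolynomialFeatures(degree=2).fit_transform(X)
imp = LogisticRegression(max_iter=1000).fit(basis[labeled], y[labeled],
                                            sample_weight=w)
m_hat = imp.predict_proba(basis)[:, 1]           # imputed E[Y | X] for every subject

# Step II: augmentation. For binary Y, (Y - p)^2 = Y(1 - 2p) + p^2, so the
# Brier score is the imputed plug-in term plus a weighted residual correction.
plug_in = np.mean(m_hat * (1.0 - 2.0 * p_hat) + p_hat ** 2)
augment = np.average((y[labeled] - m_hat[labeled]) * (1.0 - 2.0 * p_hat[labeled]),
                     weights=w)
print("SSL Brier score estimate:", plug_in + augment)
```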